By @carnby.
This notebook showcases the basic matta visualizations, as well as their usage.
Note that the init_javascript call is not needed when running on a local server, provided you have added the JavaScript code to your IPython profile.
In [1]:
import pandas as pd
import networkx as nx
import matta
import json
import requests
from networkx.readwrite import json_graph
# we do this to load the required libraries when viewing on NBViewer
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')
Out[1]:
Wordclouds are implemented using the d3.layout.cloud layout by Jason Davies. They work with bags of words, for which Python's Counter class (from collections) is a good fit.
In [2]:
hamlet = requests.get('http://www.gutenberg.org/cache/epub/2265/pg2265.txt').text
hamlet[0:100]
Out[2]:
In [3]:
import re
from collections import Counter
# findall avoids the empty strings that re.split can leave at the ends of the text
words = re.findall(r'\w+', hamlet.lower())
counts = Counter(words)
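To see what a bag of words looks like without downloading anything, here is a minimal sketch on a short quote. The sample string and the variable names are made up for illustration; only re and Counter from the standard library are used.

```python
import re
from collections import Counter

# hypothetical sample text, standing in for the full play
sample = "To be, or not to be: that is the question."

# lowercase the text and keep runs of word characters as tokens
words = re.findall(r"\w+", sample.lower())

# the bag of words: each distinct token mapped to its number of occurrences
counts = Counter(words)

# the two most frequent tokens, as (word, frequency) pairs
top = counts.most_common(2)
print(top)
```

most_common returns (word, frequency) pairs sorted by frequency, which is the same shape of data that the DataFrame built in the next cell holds.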
In [4]:
# list(counts.items()) works on both Python 2 and Python 3
df = pd.DataFrame.from_records(list(counts.items()), columns=['word', 'frequency'])
df.sort_values(['frequency'], ascending=False, inplace=True)
df.head()
Out[4]:
In [5]:
matta.wordcloud(dataframe=df.head(500), text='word', font_size='frequency',
                typeface='Helvetica', font_weight='bold',
                font_color={'value': 'frequency', 'palette': 'cubehelix', 'scale': 'threshold'})
Treemaps use the Treemap Layout from d3.js. They work with trees, which we construct through networkx.DiGraph.
In [6]:
flare_data = requests.get('https://gist.githubusercontent.com/mbostock/4063582/raw/a05a94858375bd0ae023f6950a2b13fac5127637/flare.json').json()
In [7]:
flare_data['name']
Out[7]:
In [8]:
tree = nx.DiGraph()

def add_node(node):
    # recursively copy a nested JSON node into the tree and return its id
    node_id = tree.number_of_nodes() + 1
    attrs = {'name': node['name']}
    if 'size' in node:
        attrs['size'] = node['size']
    # the node must be added before recursing so that child ids keep increasing
    tree.add_node(node_id, **attrs)
    if 'children' in node:
        for child in node['children']:
            child_id = add_node(child)
            tree.add_edge(node_id, child_id)
    return node_id

root = add_node(flare_data)
# treemap requires this attribute
tree.graph['root'] = root
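The recursive construction above can be checked on a tiny hierarchy without networkx. This is a sketch under assumptions: the sample data, the flatten helper and the nodes/edges containers are hypothetical and only mirror the id scheme used by add_node.

```python
def flatten(node, nodes, edges):
    """Copy a nested {'name', 'size', 'children'} dict into flat containers."""
    node_id = len(nodes) + 1  # same id scheme as add_node: 1, 2, 3, ...
    nodes[node_id] = {k: node[k] for k in ('name', 'size') if k in node}
    for child in node.get('children', []):
        child_id = flatten(child, nodes, edges)
        edges.append((node_id, child_id))
    return node_id

# hypothetical hierarchy in the same shape as flare.json
sample = {
    'name': 'root',
    'children': [
        {'name': 'a', 'size': 3},
        {'name': 'b', 'children': [
            {'name': 'c', 'size': 4},
            {'name': 'd', 'size': 5},
        ]},
    ],
}

nodes, edges = {}, []
root = flatten(sample, nodes, edges)

# in a rooted tree every node except the root has exactly one parent
parents = {}
for parent, child in edges:
    parents.setdefault(child, []).append(parent)
is_tree = root not in parents and all(
    len(parents.get(n, [])) == 1 for n in nodes if n != root
)
print(root, sorted(edges), is_tree)
```

The next cell runs the corresponding check on the real graph with nx.is_arborescence.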
In [9]:
nx.is_arborescence(tree)
Out[9]:
In [10]:
import seaborn as sns
In [11]:
matta.treemap(tree=tree, node_value='size', node_label='name',
              node_color={'value': 'parent.name', 'scale': 'ordinal',
                          'palette': sns.husl_palette(15, l=.4, s=.9)})
Sankey or flow diagrams use the Sankey plugin by Mike Bostock. They work with digraphs, just like treemaps. Note that graphs with loops are not supported.
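Since loops are not supported, it can be worth checking the data before plotting. Below is a minimal sketch, assuming a node-link style dictionary ('nodes' plus 'links' with source, target and value). The has_cycle helper and both sample graphs are made up for illustration, and graphlib requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

def has_cycle(node_link):
    """Return True if the directed links contain a cycle."""
    sorter = TopologicalSorter()
    for link in node_link['links']:
        # add(node, *predecessors): the target comes after its source
        sorter.add(link['target'], link['source'])
    try:
        sorter.prepare()  # raises CycleError when no valid ordering exists
    except CycleError:
        return True
    return False

# hypothetical flows: indices in 'links' refer to positions in 'nodes'
acyclic = {
    'nodes': [{'name': 'coal'}, {'name': 'electricity'}, {'name': 'homes'}],
    'links': [
        {'source': 0, 'target': 1, 'value': 10},
        {'source': 1, 'target': 2, 'value': 7},
        {'source': 0, 'target': 2, 'value': 3},
    ],
}
cyclic = {
    'nodes': [{'name': 'a'}, {'name': 'b'}],
    'links': [
        {'source': 0, 'target': 1, 'value': 1},
        {'source': 1, 'target': 0, 'value': 1},
    ],
}

print(has_cycle(acyclic), has_cycle(cyclic))
```

Once the data is loaded into a networkx DiGraph, nx.is_directed_acyclic_graph gives the same answer.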
In [12]:
sankey_data = requests.get('http://bost.ocks.org/mike/sankey/energy.json')
In [13]:
sankey_graph = json_graph.node_link_graph(json.loads(sankey_data.text), directed=True)
In [14]:
# peek at the first node and the first edge, with their attributes
next(iter(sankey_graph.nodes(data=True))), next(iter(sankey_graph.edges(data=True)))
Out[14]:
In [22]:
matta.flow(graph=sankey_graph, node_label='name', link_weight='value', node_color='indigo',
           node_width=12, node_padding=13,
           link_color={'value': 'value', 'palette': 'Greys', 'scale': 'threshold'}, link_opacity=0.8)
Parallel Coordinates are based on the code by Jason Davies. They work with pandas.DataFrame.
In [23]:
df = pd.read_csv('http://bl.ocks.org/jasondavies/raw/1341281/cars.csv', index_col='name')
df.head()
Out[23]:
In [24]:
matta.parcoords(dataframe=df)
In [25]:
df = pd.read_csv('https://www.jasondavies.com/parallel-sets/titanic.csv')
df.head()
Out[25]:
In [27]:
matta.parsets(dataframe=df, columns=['Survived', 'Sex', 'Age', 'Class'])
Graphs built with networkx are visualized using the Force Layout in d3.js. The example below uses the bipartite Davis Southern Women graph.
In [28]:
graph = nx.davis_southern_women_graph()
In [29]:
# color each node by its bipartite set and size it by its degree
for node, data in graph.nodes(data=True):
    data['color'] = 'purple' if data['bipartite'] else 'green'
    data['size'] = graph.degree(node)
In [30]:
matta.force(graph=graph, link_distance=100, height=600,
            node_ratio='size',
            node_color={'value': 'bipartite', 'scale': 'ordinal', 'palette': 'Set2'})